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Abstract 

We propose a new stochastic algorithm (generalized simulated an- 
nealing) for computationaUy finding the global minimum of a given 
(not necessarily convex) energy /cost function defined in a continuous 
D-dimensional space. This algorithm recovers, as particular cases, 
the so called classical ( "Boltzmann machine" ) and fast ( "Cauchy ma- 
chine") simulated annealings, and can be quicker than both. 
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1 INTRODUCTION 



The central step of an enormous variety of problems (in Physics, Chemistry, 
Statistics, Neural Networks, Engineering, Economics) is the minimization of 
an appropriate energy/cost function defined in a D-dimensional continuous 
space (x G R^). If the energy is convex (single minimum), any gradient 
descent method easily solves the problem. But if the energy is nonconvex 
(multiple extrema) the solution requires more sophisticated methods, since a 
gradient descent procedure could easily trap the system in a local minimum 
(instead of one of the global minima we are looking for). This sophistication 
must necessarily involve possible "hill climbings" (for detrapping from local 
minima), and can be heavily computer-time-consuming. Consequently, var- 
ious algorithmic strategies have been developed along the years for making 
this important problem increasingly tractable. One of the generically most 
efficient (hence popular) methods is simulated annealing, to which this pa- 
per is dedicated. In this technique, one or more artificial temperatures are 
introduced and gradually cooled, in complete analogy with the well known 
annealing technique frequently used in Metallurgy for making a molten metal 
to reach its crystalline state {global minimum of the thermodynamical en- 
ergy). This artificial temperature (or set of temperatures) acts as a source 
of stochasticity, extremely convenient for eventually detrapping from local 
minima. Near the end of the process, the system hopefully is inside the 
attractive basin of the global minimum (or in one of the global minima, if 
more than one exists, i.e., if there is degeneracy) , the temperature is practi- 
cally zero, and the procedure asymptotically becomes a gradient descent one. 
The challenge is to cool the temperature the quickest we can but still having 
the guarantee that no definite trapping in any local minimum will occur. 
More precisely speaking, we search for the quickest annealing (i.e., in some 
sense approaching a quenching) which preserves the probability of ending in 
a global minimum being equal to one. The first nontrivial solution along 
this line was provided in 1983 by Kirkpatrick et al |]T| for classical systems, 
and was extended in 1986 by Ceperley and Alder for quantum systems. 
It strictly follows quasi-equilibrium Boltzmann-Gibbs statistics. The system 
"tries" to visit, according to a visiting distribution assumed to be Gaussian 
(i.e., a local search distribution) in the neighborhood of its actual state x. 
The jump is always accepted if it is down hill (of the energy/cost funtion); 
if it is hill climbing, it might be accepted according to an acceptance proba- 
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bility assumed to be the canonical-ensemble Boltzmann-Gibbs one. Geman 
and Geman showed, for the classical case, that a necessary and suffi- 
cient condition for having probability one of ending in a global minimum is 
that the temperature decreases logarithmically with time. This algorithm is 
sometimes referred to as classical simulated annealing (CSA) or Boltzmann 
machine. We easily recognize that, if instead of decreasing, the temperature 
was maintained fixed, this procedure precisely is the well known Metropolis 
et al 1^ one for simulating thermostatistical equilibrium. 

The next interesting step along the present hne was Szu's 1987 proposal p 
of using a Cauchy-Lorentz visiting distribution, instead of the Gaussian one. 
This is a semi-local search distribution: the jumps are frequently local, but 
can occasionally be quite long (in fact, this is a Levy-flight- like distribution). 
The acceptance algorithm remains the same as before. As Szu and Hartley 
showed, the cooling can now be much faster (the temperature is now allowed 
to decrease like the inverse of time), which makes the entire procedure quite 
more efficient. This algorithm is referred to as fast simulated annealing (FSA) 
or Cauchy machine. 

The goal of the present work is to generalize both annealings within an 
unified picture which is inspired in the recently Generalized Statistical Me- 
chanics IP, (see also [@, P]), with the supplementary bonus of providing an 
algorithm which is even quicker than that of Szu's. In Section 2, we briefly 
review this generalized thermostatistics, describe the optimization algorithm 
and prove that, if the cooling rithm is appropriate, the probability of end- 
ing in a global minimum equals one. In Section 3, we numerically discuss a 
simple D = 1 example. Finally, we conclude in Section 4. 

2 GENERALIZED SIMULATED ANNEAL- 
ING (GSA) 

Inspired by multifractals, one of us proposed a generalized entropy Sq as 
follows 

S, = k (gG^) (1) 

where {pi\ are the probabilities of the microscopic configurations and k is 
a conventional positive constant. In the g — *• 1 limit, Sq recovers the well 
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known Shannon expression —ks'^ pilnpi. Optimization of this entropy for 

i 

the canonical ensemble yields 

P. ^ (2) 

with 

^ 5: [1-/3(1 -g)E,]T^ (3) 

i 

where (3 = 1/kT is a Lagrange parameter, and {Ei} is the energy spectrum. 
We immediately verify that, in the q 1 limit, we recover Boltzmann-Gibbs 
statistics, namely Pi = exp{—PEi)/Zi with Zi = ^ exp(— 

i 

Let us now focus the acceptance probability Pq^{xt a^t+i), where t is 
the discrete time {t = 1, 2, 3, ■ ■ ■) corresponding to the computer iterations. 
For the Boltzmann machine {qa = 1) we have, for example, the Metropolis 
algorithm |Q : 

p _ ^ X _ / 1 if Eixt+i) < E{xt) 

F^^xt xt+i) - I ^[ij(^,)-i^(.Vi)]/Tf^W if E{xt+^) > E{xt) 

where T{^(t) is the = 1 acceptance temperature at time t {k = 1 from 
now on). We see that T^{t) = +0 implies Pi = 1 if E{xt+i) < E{xt), and 
Pi = if E{xt+i) > E{xt). These limits are important for the asymptotic 
stabilization of the annealing process, and we will require them to be satisfied 
by the generalized form. Eq.(4) satisfies detailed balance. A generalization 
of Eq.(4) that also satisfies the detailed balance condition, and asymptoti- 
cally generates states distributed according to Eq.(2), must involve the ratio 
of terms of the form 1 — /3(1 — q)E. Nevertheless, we could not find a gen- 
eralization along this line which satisfies the T = limits mentioned above. 
Instead, we worked with a form that generalizes Eq.(4), satisfies the limits at 
T = 0, and goes to an equilibrium distribution, although generically different 
from that of Eq.(2). This generalized acceptance probability reads: 

r 1 if E{xt+i) < E{xt) 

Pg^{xt xt+i) = I 1 ^ if E{xt+i) > E{xt) 

y [l+{QA-^){E{xt+i)^E(xt))/TA^] 

(5) 
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Although it is possible to work under generic conditions, for simplicity we 
shall assume here that E{x) > (Vx). Moreover, we shall assume that 
Q'A > 1, so T^^it) can decrease down to zero without any type of singularities. 
Within these hyphothesis, Pg^ G [0, 1] (VgA), and, for T^^{t) decreasing from 
infinity to zero, Pg^ monotonically varies from 1 to if E{xt+i) > E{xt), and 
equals 1 whenever E{xt+i) < E{xt). 

We can now focus the Xt Xt+i isotropic visiting distribution gq^^Axt) 
where Axt = (xt+i — Xt). It satisfies 



rfpp^-V(p) = i (6) 

JO 

where = DTl^^^ /T (^y + 1^ is the D-dimensional complete solid angle. 
For the Boltzmann machine {qy = 1) we have [Q, 



gi{Axt) oc exp 



TY{t) 



(7) 



where {t) is the gy = 1 visiting temperature at time t. Using condition (6) 
we obtain 



For the Cauchy machine {qy = 2) we have P| 

g2(Axt) oc '^^^ jj-r (9) 

where T2{t) is the gy = 2 visiting temperature at time t. The functional 

form of Eq. (9) is the D-dimensional Fourier transform of exp{— T2^(t)|y|} 
(see 0). Using condition (6) we obtain 

g^^Axt) = -^j^ ^ (10) 

^ ^ vr^ mitW + iAx,)^}"^ ^ ^ 



Within the present scheme, a natural proposal for unifying (8) and (10) is 

\[Tiit)Y + (qv - mAx.r} 



9,vi^^t) = c-—-— .^^'^''"^^^^1. .^.^ (11) 
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where a, b, c, d and e are {qv, -D)-dependent pure numbers to be determined. 
Using condition (6) and recalling that Axt may carry dimensions (e.g, [length]) 
we immediately establish that 

To further decrease the number of independent pure numbers to be deter- 
mined, let us address a central point, namely the fact that the method has to 
guarantee that, at the t ^ oo limit, the system must be at a (jf/o 6a/ minimum. 
For this to occur (see [^] and references therein) the state visiting must be 

oo 

"infinite often in time (iot)", which indeed occurs if ^ ggy{Axto) diverges 

t=to 

for fixed Axt^ with to >> 1- Under these conditions we have that 

oo oo 

E g,,{Ax,,) <x Y.K{t)r (13) 

t=to t=to 

We know § that, for arbitrary D, if) = T^^{1) ln2/ ln(l +t) and T^it) = 
T2^(l)/t, which are conveniently unified with 

^ 2qv-i __ I 

Tqy{t) = ^9v(l)^YTt)9^-^^ 

Replacing (14') into Eq. (13) we obtain 

oo oo 

E 9gvi^Xt,) OC E ^(^^JTTjrf (15) 

t=to t=to 

For arbitrary D and qv = 1, 2 it is {qv — l)d = 1. We assume, for 
simplicity, that the same holds Vgy, hence 

d=^— (yqv,WD) (16) 

qv-l 

consequently the series (15) is the harmonic one, hence diverges (logarithmi- 
cally) as desired. If we use Eqs. (12) and (16) into (11) we obtain 

-D 



1 + (gy - 1)6 ^ 



gqvi^^t) = c- '-^^ (17) 
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For qv = 1, Eq. (17) must recover Eq. (8), hence 6=1 and a = 1 (for 
arbitrary D). For qv = 2, Eq. (17) must recover Eq. (10), hence 6 = 1 and 
a = ^^Y^ (for arbitrary D). For simphcity we assume 

b = 1 (Vgv, VD) (18) 

Finally, condition (6) univocally determines the normalizing pure number c 
as a function of the rest of the free parameters. Using this and Eq. (18) into 
Eq. (17) yields 



TT 



r 



l + {qv- 1) 



(Axt)2 



(19) 

where only one undetermined pure number (namely a{qv,D)) is now left. 
It satisfies, as already mentioned, a{l,D) = 1 and a{2,D) = {D + l)/2. 
Although more general forms are possible, we shall adopt the simplest qy- 
dependence, namely a linear interpolation, hence 

a = l + ^^Y^ (qv - 1) (Vgy, WD) (20) 

Replacing this into Eq. (19) we otain our final visiting distribution 



gqy{Axt) = ( — — ) '/ 1 -[y- 1 (Vgv, V^) 



1 + (gy - 1) 



(21) 

The second moment of this distribution diverges for qy > 5/3, and the 
distribution becomes not normalizable for > 3. 

There is no particular reason for Tj^ being equal to but, following 
[|], we shall use here the simplest choice, i.e., T^(t) = T^^(t), Vt (given by 
Eq. (14)). We can now summarize the whole algorithm for finding a global 
minimum of a given energy/cost function E{x): 

(i) Fix {qA,(lv)- Start, at t = 1, with an arbitrary value xi and a high 
enough value for Tg^(l) (say about 2 times the height of the highest 
expected "hill" of E{x), and calculate -E(xi); 



7 



(ii) Then randomly generate Xt+i from Xt according to Eq. (21) to determine 

the size of the jump Axt, and isotropically determine its direction; 

(iii) Then calculate E{xt+i): 

If E{xt+i) < E{xt), replace Xt by Xt+i; 

If E{xt+i) > E{xt), run a random number r G [0, 1]: if r > Pq^ given 
by Eq. (5) with T^{t) = T^^{t), retain xt; otherwise, replace Xt by 

(iv) Calculate the new temperature Tj^ using Eq. (14) and go back to (ii) 
until the minimum of E{x) is reached within the desired precision. 

3 A SIMPLE D=l ILLUSTRATION 

In this Section we numerically treat a simple D = 1 example with a double 
purpose: on one hand to exhibit how the procedure works and, on the other, 
to find for which pair [qy, g^) the algorithm is the quickest. (We recall that 
(qvylA) = (1, 1) corresponds to CSA and (2,1) to FSA). 
We choose the same example treated in |Q, namely 

E{x) = x^- I6x^ + 5x + Eo (22) 

where we have introduced the additive constant Eq ~ 78.3323 so that E{x) > 
0, Vx, thus satisfying the convention adopted below Eq. (5); see Fig. 1. As 
initial conditions for all of our runs we used Xi = 2 and Tgy = 100. In Fig. 
2 we can see typical runs for = 1.1 and different values of qv- Clearly 
the case qv = 2.5 is much faster and precise than classical and fast anneal- 
ings {qv = 1.1 ~ 1 and 2 respectively). To study the (qv^QA) influence on 
the quickness of the algorithm we have adopted once for ever, an arbitrary 
convergence criterium. For each [qv^QA) pair we evaluate the mean value 
of Xt in intervals of 100 time steps. Whenever two successive intervals pre- 
sented mean values whose difference was smaller than a precision e = 10^, we 
stopped the run. We then evaluated the total iteration time r and repeated 
the whole annealing procedure 10 times. Finally, we compute the average 
total calculation time < r >. The {qy, qA) dependence of the average < r > 
is presented in Fig. 3 for typical values of qA- Fig. 3 indicates that machines 
with = 1.1 and qy = 2.9 are typically 5 times faster than the Cauchy ma- 
chine , which is in turn about 5 times faster than the Boltzmann machine 
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[|I|, 1^, ^. We have done our simulations on a 486 DX microcomputer with 
a clock of 50 MHz. In this machine each one of the 10 solutions demanded, 
approximately, 1 minute and 20 seconds of CPU time for qy = 2, and only 
20 seconds for qy = 2.5. Finally in Fig. 4 we present the dependence of 
< r > with qA for qy = 2; for this case the Cauchy machine ^ is the best 
performant. These results indicate that the quickest algorithm occurs for 
= 1 and qy = 3. 

4 CONCLUSION 

Inspired in a recent generalization of Boltzmann-Gibbs statistical mechanics, 
we have heuristically developed a generalized simulated annealing (character- 
ized by the parameters {qy,qA)) which unifies the so called classical (Boltz- 
mann machine; qv = Qa = ^) and fast (Cauchy machine; qy/2 = qA = 1) 
ones. This computational method is based on stochastics dynamics (which 
asymptotically becomes, as time runs to infinity, a gradient descent method), 
and enables, with probability one, the identification of a global minimum of 
any (sufficiently nonsingular) given energy/cost function which depends on a 
continuous D-dimensional variable x. While the discrete time t increases, it 
might happen that Xt provisionally stabilizes on a given value, and eventually 
abandons it running towards the global minimum. This temporary residence 
can be used, as bonus of the present method, to identify some of the local 
minima. If sufficiently many computational runs are done by starting at ran- 
dom initial positions {xi}, this property could in principle be used to identify 
all the local minima as well as all the global ones. 

For simplicity, we have mainly discussed herein the restricted region qy > 
1 and g^i > 1 (with E{x) > 0, Vx), and have identified the {qy, qA) — (2.9, 1) 
machines as the most performant ones in practical terms. This algorithm 
has been illustrated herein with a simple two-minima D = 1 energy function, 
and has already been successfully used |jlO[ for recovering the global energy 
minima (with respect to the dihedral angle) of a variety of simple molecules 
(e.g., CHsOH, H2O2, C2HQ). It should be very interesting to test the present 
generalized algorithm with many-body systems presenting a great number 
of minima (spin-glass-like frustrated systems, traveling salesman problem , 
neural networks, complex economic systems). 

We acknowledge N. Caticha for stressing our attention onto Szu's algo- 
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rithm [Q, as well as K.C Mundim and A.M.C. de Souza for very useful 
discussions. At the time of submission of this paper we took notice that 
this algorithm (informally communicated by us to a few people) has already 
been succesfuUy implemented for the Traveling Salesman Problem |jTl], |1^], 
for fitting curves |lF 



and for a problem of genetic mutations [|15 
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Caption for figures 



Fig. 1: D — 1 energy function E{x) — x'^ — IGx^ + 5x + Eq vs. x for Eq — 
78.3323; global minimum = —2.90353 and E{xl) = 0; maximum: 
x*2 = 0.156731 and E{x*2) = 78.7235; local minimum: x^ = 2.74680 and 
E{xl) = 28.2735. 

Fig. 2: Typical runs of the GSA algorithm Xt vs. t (annealing time) for 
initial conditions Xi = 2, Tg^(l) = 100. All four runs correspond to 
qa = 1.1; a) qv = 1.1, b) qy = 1.5, c) qy = 2, d) qy = 2.5. Notice the 
different scales for the ordinates. 

Fig. 3: Average total calculating time < r > vs. qy for two typical values 
of (A: qA = 1.5; 0: = l-l)- 

Fig. 4: Average total calculation time < r > vs. qA for qy — 2. 
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